
    Development of Data-Driven Dispatching Heuristics for Heterogeneous HPC Systems

    In the context of High-Performance Computing systems, the use of effective dispatching heuristics for scheduling and allocating incoming jobs is essential to achieving good levels of Quality of Service. In this thesis we focus on the design and analysis of resource allocation heuristics, targeted at heterogeneous HPC systems in which nodes may be equipped with different types of processing units. We then employ data-driven heuristics to predict the duration of jobs, and evaluate the whole approach in terms of system throughput. In particular, we consider Eurora, a heterogeneous HPC system built by CINECA, together with a workload captured from its system log, containing real jobs submitted by users. All of this was made possible by AccaSim, an HPC system simulator developed at the Dipartimento di Informatica - Scienza e Ingegneria (DISI) of the Università di Bologna, to which this work contributed substantially. The thesis shows that the impact of different allocation heuristics on the throughput of a heterogeneous HPC system is far from negligible, with variations that can peak at an order of magnitude, and that are more pronounced over short time spans, on the order of months. We also observed that employing heuristics to predict the duration of jobs greatly benefits throughput under all allocation heuristics, especially those that integrate such data-driven elements more deeply. Finally, our analysis allowed us to fully characterize the Eurora system and its workload, giving us a better understanding of the effects of the different dispatching methods on it and letting us extend our considerations to other classes of systems as well.
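    As a concrete illustration of the approach, the sketch below shows how a data-driven duration estimate can feed a best-fit allocation heuristic on heterogeneous nodes. It is a minimal example under assumed names (Job, Node, predict_duration, allocate) and is neither the thesis code nor the AccaSim API.

        # Minimal sketch (illustrative names, not the thesis code): predict a
        # job's duration from the user's past runtimes, then place the job on
        # the feasible node that leaves the fewest idle CPUs.
        from dataclasses import dataclass, field
        from statistics import mean

        @dataclass
        class Node:
            name: str
            cpus: int
            gpus: int  # heterogeneous systems: some nodes carry accelerators
            free_cpus: int = field(init=False)
            def __post_init__(self):
                self.free_cpus = self.cpus

        @dataclass
        class Job:
            user: str
            cpus: int
            gpus: int

        # Hypothetical history of completed job runtimes (seconds) per user.
        history = {"alice": [120.0, 95.0], "bob": [3600.0]}

        def predict_duration(job: Job, default: float = 600.0) -> float:
            """Data-driven estimate: mean runtime of the user's past jobs."""
            past = history.get(job.user)
            return mean(past) if past else default

        def allocate(job: Job, nodes: list) -> Node:
            """Best-fit heuristic: pick the feasible node with the least slack."""
            feasible = [n for n in nodes
                        if n.free_cpus >= job.cpus and n.gpus >= job.gpus]
            if not feasible:
                return None
            best = min(feasible, key=lambda n: n.free_cpus - job.cpus)
            best.free_cpus -= job.cpus
            return best

        nodes = [Node("cpu-node", 16, 0), Node("gpu-node", 8, 2)]
        job = Job("alice", cpus=4, gpus=1)
        print(predict_duration(job), allocate(job, nodes).name)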

    Online Fault Classification in HPC Systems through Machine Learning

    As High-Performance Computing (HPC) systems strive towards the exascale goal, studies suggest that they will experience excessive failure rates. For this reason, detecting and classifying faults in HPC systems as they occur, and initiating corrective actions before they can transform into failures, will be essential for continued operation. In this paper, we propose a fault classification method for HPC systems based on machine learning that has been designed specifically to operate with live streamed data. We cast the problem and its solution within the realistic operating constraints of online use. Our results show that almost perfect classification accuracy can be reached for different fault types with low computational overhead and minimal delay. We have based our study on a local dataset, which we make publicly available, that was acquired by injecting faults into an in-house experimental HPC system. Comment: Accepted for publication at the Euro-Par 2019 conference.
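    As a sketch of the streaming setup (the metrics, window size, and fault labels are invented for illustration; this is not the paper's pipeline), per-window statistics are extracted from a metric stream and fed to a random forest classifier:

        # Illustrative sketch: online fault classification from streamed node
        # metrics. One feature vector is computed per window of samples; the
        # synthetic "fault" is a drifting load added to a healthy baseline.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        WINDOW = 60  # one feature vector per 60 streamed samples

        def window_features(samples):
            """Cheap online features: mean, std, min, max of one window."""
            return np.array([samples.mean(), samples.std(),
                             samples.min(), samples.max()])

        def make_stream(label, n=200):
            """Synthetic stream: label 0 = healthy, 1 = drifting-load fault."""
            base = rng.normal(50, 2, size=(n, WINDOW))
            if label:
                base += np.linspace(0, 20, WINDOW)
            return np.array([window_features(w) for w in base]), np.full(n, label)

        Xh, yh = make_stream(0)
        Xf, yf = make_stream(1)
        X, y = np.vstack([Xh, Xf]), np.concatenate([yh, yf])

        # Train on every other window, test on the rest.
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X[::2], y[::2])
        print("held-out accuracy:", clf.score(X[1::2], y[1::2]))

    Restricting the features to cheap per-window statistics is what keeps the computational overhead and classification delay low enough for online use.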

    From Facility to Application Sensor Data: Modular, Continuous and Holistic Monitoring with DCDB

    Today's HPC installations are highly complex systems, and their complexity will only increase as we move to exascale and beyond. At each layer, from facilities to systems, from runtimes to applications, a wide range of tuning decisions must be made in order to achieve efficient operation. This, however, requires systematic and continuous monitoring of system and user data. While many insular solutions exist, a system for holistic and facility-wide monitoring is still lacking in the current HPC ecosystem. In this paper we introduce DCDB, a comprehensive monitoring system capable of integrating data from all system levels. It is designed as a modular and highly scalable framework based on a plugin infrastructure. All monitored data is aggregated in a distributed NoSQL data store for analysis and cross-system correlation. We demonstrate the performance and scalability of DCDB, and describe two use cases in the area of energy management and characterization. Comment: Accepted at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2019.
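    The plugin-based design can be sketched as follows; DCDB itself is a C++ framework, so every class name below is an illustrative stand-in rather than DCDB's actual API:

        # Conceptual sketch of a plugin-based monitoring pipeline: each plugin
        # samples one class of sensors, and a collector aggregates timestamped
        # readings into a common store (an in-memory list stands in for the
        # distributed NoSQL data store).
        import time
        from abc import ABC, abstractmethod

        class SensorPlugin(ABC):
            """A monitoring plugin: knows how to sample one kind of sensor."""
            @abstractmethod
            def read(self) -> dict:
                ...

        class CpuTempPlugin(SensorPlugin):
            def read(self) -> dict:
                return {"node0/cpu0/temp": 48.5}  # stand-in for a hwmon read

        class PowerPlugin(SensorPlugin):
            def read(self) -> dict:
                return {"node0/power": 310.0}  # stand-in for a PDU query

        class Collector:
            """Aggregates all plugin readings, tagged with a timestamp."""
            def __init__(self, plugins):
                self.plugins = plugins
                self.store = []  # (timestamp, sensor name, value) triples

            def poll(self):
                ts = time.time()
                for plugin in self.plugins:
                    for sensor, value in plugin.read().items():
                        self.store.append((ts, sensor, value))

        collector = Collector([CpuTempPlugin(), PowerPlugin()])
        collector.poll()
        print(collector.store)

    In a real deployment the in-memory list would be replaced by the distributed NoSQL store, which is what enables facility-wide analysis and cross-system correlation.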

    Veno-occlusive disease nurse management: Development of a dynamic monitoring tool by the GITMO nursing group

    Veno-occlusive disease (VOD) is a complication arising from the toxicity of conditioning regimens that has a significant impact on the survival of patients who undergo stem cell transplantation. There are several known risk factors for developing VOD, and assessing them before the start of conditioning regimens could improve the quality of care. Equally important are the early identification of signs and symptoms ascribable to VOD, rapid diagnosis, and timely adjustment of support therapy and treatment. Nurses have a fundamental role at the assessment and monitoring stages and should therefore have documented skills and training. The literature defines nurses' areas of competence in managing VOD, but in actual clinical practice this is not so clear. Moreover, there is an intrinsic difficulty in managing VOD due to its rapid and often dramatic evolution, together with a lack of care tools to guide nurses. Through a complex evidence-based process, the nursing board of the Gruppo Italiano per il Trapianto di Midollo Osseo, cellule staminali emopoietiche e terapia cellulare (GITMO) has developed an operational flowchart and a dynamic monitoring tool applicable to haematopoietic stem cell transplantation patients, whether or not they develop this complication.

    AccaSim: An HPC simulator for workload management

    We present AccaSim, an HPC simulator for workload management. Thanks to its scalability and high customizability, AccaSim lets users easily represent various real HPC system resources, develop dispatching methods, and carry out large experiments across different workload sources. AccaSim is thus an attractive tool for conducting controlled experiments in HPC dispatching research.
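    A toy version of the controlled experiments AccaSim enables might look like the sketch below. It is not the AccaSim API (all names are invented): a FIFO dispatcher is replayed over a synthetic workload to measure throughput.

        # Illustrative sketch: replay a tiny synthetic workload through a FIFO
        # dispatcher and report throughput. Swapping simulate() for another
        # dispatching method over the same workload gives a controlled comparison.
        import heapq

        # (submit time, duration, cores needed) for a toy three-job workload
        workload = [(0, 100, 4), (10, 50, 4), (20, 30, 8)]
        TOTAL_CORES = 8

        def simulate(jobs, cores):
            """Start each job, in submission order, once enough cores are free."""
            free, clock, running = cores, 0, []  # running: (end time, cores) heap
            finished = 0
            for submit, duration, need in sorted(jobs):
                clock = max(clock, submit)
                while free < need:  # wait for running jobs to release cores
                    end, c = heapq.heappop(running)
                    clock, free = max(clock, end), free + c
                free -= need
                heapq.heappush(running, (clock + duration, need))
                finished += 1
            makespan = max(end for end, _ in running)
            return finished / makespan  # jobs completed per time unit

        print(f"throughput: {simulate(workload, TOTAL_CORES):.4f} jobs/unit")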

    Tuning of Hydrogel Architectures by Ionotropic Gelation in Microfluidics: Beyond Batch Processing to Multimodal Diagnostics

    Microfluidics is emerging as a promising tool for controlling the physicochemical properties of nanoparticles and for accelerating clinical translation. Indeed, microfluidic-based techniques offer several advantages over batch processes in nanomedicine, allowing fine-tuning of process parameters. In particular, the use of microfluidics to produce nanoparticles has paved the way for the development of nano-scaled structures for improved detection and treatment of several diseases. Here, ionotropic gelation is implemented in a custom-designed microfluidic chip to produce different nanoarchitectures based on chitosan-hyaluronic acid polymers. The selected biomaterials make the formulation biocompatible, biodegradable, and non-toxic, and thus promising for nanomedicine applications. Furthermore, the results show that morphological structures can be tuned through microfluidics by controlling the flow rates. Beyond the nanostructures themselves, the ability to encapsulate a gadolinium contrast agent for magnetic resonance imaging and a dye for optical imaging is demonstrated. In conclusion, the polymer nanoparticles designed here revealed a dual capability: enhancing the relaxometric properties of gadolinium by attaining Hydrodenticity, and serving as a promising nanocarrier for multimodal imaging applications.